Speech synthesizing apparatus and method, and storage medium therefor

ABSTRACT

A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign penalties, on the basis of the power and phoneme duration contained in the attribute information, to the set of phoneme data stored in the retrieved-result storage area. A processing unit for determining representative phoneme data performs sorting on the basis of the assigned penalties and, based upon the sorted results, selects the phoneme data to be employed in the synthesis of a speech waveform.

BACKGROUND OF THE INVENTION

This invention relates to a speech synthesizing apparatus having a database for managing phoneme data, in which the apparatus performs speech synthesis using the phoneme data managed by the database. The invention further relates to a method of synthesizing speech using this apparatus, and to a storage medium storing a program for implementing this method.

A method of speech synthesis that concatenates waveforms (referred to below as the "concatenative synthesis method") is available in the prior art. The concatenative synthesis method changes prosody with the pitch-synchronous overlap-add method (P-SOLA), which places pitch waveform units extracted from the original waveform in conformity with the desired pitch timing. An advantage of the concatenative synthesis method is that the synthesized speech obtained is more natural than that provided by a synthesis method based upon parameters. A disadvantage is that the allowable range for changes in prosody is narrow.

Accordingly, sound quality is improved by preparing speech data covering a wide variety of variations and by selecting and using that data appropriately. Information such as the phoneme environment (the phoneme that is the object of synthesis together with several phonemes on both sides of it) and the fundamental frequency F₀ is used as the criterion for selecting the synthesis unit.

However, the conventional method of synthesizing speech described above involves a number of problems.

For example, if a database contains a plurality of items of phoneme data that satisfy a certain phoneme environment and fundamental frequency F₀, the phoneme unit used in synthesis is one phoneme unit selected randomly from these items (e.g., the phoneme unit that appears first in the database). Since the database is a collection of speech uttered by human beings, not all of the phoneme data is necessarily stable (i.e., of good quality). The database may contain phoneme data that is the result of mumbling, a halting voice, slow speech or hoarseness. If one item of phoneme data is selected randomly from such a collection, there is naturally a possibility that sound quality will decline when synthesized speech is generated.

SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide a speech synthesizing apparatus and method capable of appropriately selecting phoneme data used in speech synthesis and of suppressing any decline in sound quality in speech synthesis, as well as a storage medium storing a program for implementing this method.

According to one aspect of the present invention, the foregoing object is attained by providing a speech synthesizing apparatus comprising: storage means for storing plural items of phoneme data; retrieval means for retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored in the storage means; penalty assigning means for assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved by the retrieval means; and selection means for selecting, from the phoneme data retrieved by the retrieval means, and based upon the penalty assigned by the penalty assigning means, phoneme data to be employed in synthesis of a speech waveform.

According to another aspect of the present invention, the foregoing object is attained by providing a speech synthesizing method comprising: a storage step of storing plural items of phoneme data; a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at the storage step; a penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at the retrieval step; and a selection step of selecting, from the phoneme data retrieved at the retrieval step, and based upon the penalty assigned at the penalty assigning step, phoneme data to be employed in synthesis of a speech waveform.

The present invention further provides a storage medium storing a control program for causing a computer to implement the method of synthesizing speech described above.

Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

FIG. 1 is a block diagram showing the construction of a speech synthesizing apparatus according to a first embodiment of the present invention;

FIG. 2 is a block diagram illustrating functions relating to phoneme data selection processing according to the first embodiment;

FIG. 3 is a flowchart illustrating a procedure relating to phoneme data selection processing according to the first embodiment;

FIG. 4 is a block diagram illustrating functions relating to phoneme data selection processing according to the second embodiment;

FIG. 5 is a flowchart illustrating a procedure relating to phoneme data selection processing according to the second embodiment; and

FIG. 6 is a flowchart useful in describing an overview of speech synthesizing processing.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will now be described in detail in accordance with the accompanying drawings.

[First Embodiment]

FIG. 1 is a block diagram illustrating the construction of a speech synthesizing apparatus according to a first embodiment of the present invention.

As shown in FIG. 1, the apparatus includes a control memory (ROM) 101, which stores a control program for causing a computer to implement control in accordance with the control procedure shown in FIG. 3; a central processing unit 102 for executing processing such as decisions and calculations in accordance with the control procedure retained in the control memory 101; and a memory (RAM) 103, which provides a work area when the central processing unit 102 executes various control operations. Allocated to the memory 103 are an area 202 for holding the results of phoneme retrieval, an area 204 for holding the results of penalty assignment, an area 207 for holding the results of sorting, and an area 209 for holding representative phoneme data. These areas will be described later with reference to FIG. 2. The apparatus further includes a disk device 104, which in this embodiment is a hard disk. The disk device 104 stores a database 200, described later with reference to FIG. 2. The data of the database 200 is loaded into the memory 103 when it is used. A bus 105 connects the components mentioned above.

The speech synthesizing apparatus of this embodiment uses information such as the phoneme environment and fundamental frequency to select appropriate phoneme data from the speech data recorded in the database 200 (FIG. 2) and performs waveform-editing synthesis employing the selected data.

FIG. 6 is a flowchart illustrating an overview of speech synthesizing processing according to this embodiment. The phoneme environment and fundamental frequency of a phoneme to be used are specified at step S11 in FIG. 6. This may be carried out by storing the phoneme environment and fundamental frequency in the disk device 104 as a parameter file or by entering them via a keyboard. Next, at step S12, phoneme data to be used is selected from the database 200. This is followed by step S13, at which it is determined whether any phoneme data remains to be selected. If such data remains, control returns to step S11. If it is determined that all necessary phoneme data has been selected, on the other hand, control proceeds from step S13 to step S14, and speech synthesis by waveform editing is executed using the selected phoneme data.
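The loop of steps S11 to S14 can be sketched as follows. This is a minimal Python illustration, not the apparatus's actual control program: the target encoding (a triphone label plus an F₀ value), the function name `select_units` and the choice to raise an error when no phoneme data matches are all assumptions, and the waveform editing of step S14 is left out.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical target encoding: (triphone label, average F0 in Hz).
Target = Tuple[str, float]

def select_units(targets: Sequence[Target],
                 select: Callable[[Target], Optional[str]]) -> List[str]:
    """Steps S11-S13: choose one unit ID per target, in order."""
    selected: List[str] = []
    for target in targets:            # S11: next phoneme environment and F0
        unit = select(target)         # S12: pick phoneme data from the database
        if unit is None:              # assumed policy: fail if nothing was found
            raise LookupError(f"no phoneme data for {target!r}")
        selected.append(unit)
    return selected                   # S13 -> S14: hand off to waveform editing
```

The `select` callback stands in for the per-target selection procedure of step S12, which is detailed below.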

The details of the processing for selecting phoneme data at step S12 will now be described. In the case described below, phoneme data is selected using, as criteria, the phoneme environment (three phonemes composed of the phoneme of interest and one phoneme on each side of it, a so-called "triphone") and the average fundamental frequency of the phoneme.

FIG. 2 is a block diagram illustrating functions relating to phoneme data selection processing for selecting the optimum phoneme data from a set of phoneme data in which the phoneme environments and fundamental frequencies are identical. The functions are those of a speech synthesizing apparatus according to the first embodiment.

The database 200 in FIG. 2 stores speech data in which a phoneme environment, phoneme boundaries, a fundamental frequency, power and a phoneme duration have been assigned to each item of phoneme data. A phoneme retrieval unit 201 retrieves phoneme data that satisfies a specific phoneme environment and fundamental frequency from the database 200. The area 202 stores a set of phoneme data, namely the results of the retrieval performed by the phoneme retrieval unit 201. A power-penalty assignment processing unit 203 assigns a penalty related to power to each item of phoneme data in the set stored in the area 202. The area 204 holds the results of the assignment of penalties to the phoneme data. A duration-penalty assignment processing unit 205 assigns a penalty relating to phoneme duration to each item of phoneme data.

A sorting processing unit 206 sorts the set of phoneme data with respect to specific information (power, phoneme duration, etc.) when a penalty is assigned. The area 207 holds the results of sorting. From the results obtained by assigning penalties, a data determination processing unit 208 selects the phoneme data having the smallest penalty as representative phoneme data. The area 209 holds the representative phoneme data that has been decided.

Of the speech synthesizing processing set forth above, the processing for selecting phoneme data implemented by the above-described functional arrangement will be discussed next. FIG. 3 is a flowchart illustrating a procedure relating to phoneme data selection processing for selecting the optimum phoneme data from the set of phoneme data having identical phoneme environments and fundamental frequencies.

First, at step S301, all phoneme data that satisfies the phoneme environment (triphone) and fundamental frequency F₀ specified at step S11 is extracted from the database 200 and stored in the area 202. Next, at step S302, the power-penalty assignment processing unit 203 assigns power-related penalties to the set of phoneme data stored in the area 202.
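As a rough sketch of the database records and of step S301, the following assumes one record per item of phoneme data carrying the attributes listed for the database 200. The field names, the `retrieve` helper and the F₀ tolerance used to decide whether a fundamental frequency "matches" are hypothetical; the description does not state how closely F₀ must agree.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class PhonemeRecord:
    """One item of phoneme data with its attributes (boundaries omitted)."""
    unit_id: str
    triphone: str      # phoneme environment, e.g. "t.A.k"
    f0: float          # average fundamental frequency, Hz
    power: float
    duration: float    # phoneme duration, e.g. ms

def retrieve(db: List[PhonemeRecord], triphone: str, f0: float,
             f0_tolerance: float = 10.0) -> List[PhonemeRecord]:
    """Step S301: every record with the given triphone and an F0 within tolerance."""
    return [r for r in db
            if r.triphone == triphone and abs(r.f0 - f0) <= f0_tolerance]
```

The returned list plays the role of the area 202; a real system would more likely index the database by triphone than scan it linearly.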

The guideline for power-related penalties is to assign large penalties to phoneme data whose power values depart from the average power, because the goal is to select phoneme data whose power is close to the average within the set of phoneme data. The power-penalty assignment processing unit 203 instructs the sorting processing unit 206 to sort the phoneme data set, which has been extracted from the area 202 holding the results of retrieval, based upon power values. Power here may be the power of the phoneme data or the average power per unit of time.

The sorting processing unit 206 responds by sorting the phoneme data set based upon power and storing the results in the area 207 for retaining the results of sorting. The power-penalty assignment processing unit 203 waits for the sorting to end and then assigns penalties, in accordance with the guideline mentioned above, to the sorted phoneme data stored in the area 207. For example, among items of phoneme data sorted in order of decreasing power, a penalty (e.g., 2.0 points) is added to phoneme data whose power values fall within the smallest one-third of the values and to phoneme data whose power values fall within the largest one-third. In other words, a penalty is assigned to all phoneme data other than the middle one-third.
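The one-third rule of step S302 can be written as a small function. In this sketch each outer third is taken as ⌊n/3⌋ items; that rounding is an assumption, since the description does not say how to split a set whose size is not a multiple of three, and it means that a set of fewer than three items receives no penalty. The name `tertile_penalty` is hypothetical.

```python
from typing import Dict

def tertile_penalty(values: Dict[str, float], penalty: float = 2.0) -> Dict[str, float]:
    """Add `penalty` to items outside the middle third after sorting by value."""
    order = sorted(values, key=values.get, reverse=True)   # decreasing power
    n, third = len(order), len(order) // 3
    return {uid: (penalty if rank < third or rank >= n - third else 0.0)
            for rank, uid in enumerate(order)}
```

For example, with powers `{"a": 1.0, "b": 5.0, "c": 3.0}` only `"c"` (the middle value) escapes the 2.0-point penalty.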

Next, at step S303, the duration-penalty assignment processing unit 205 assigns a penalty relating to phoneme duration through a procedure similar to that of the power-penalty assignment processing unit 203. Specifically, the duration-penalty assignment processing unit 205 instructs the sorting processing unit 206 to perform sorting based upon phoneme duration and to store the results in the area 207. On the basis of the sorted results, the duration-penalty assignment processing unit 205 adds a penalty (e.g., 2.0 points) to phoneme data whose phoneme durations fall within the shortest one-third of durations and to phoneme data whose phoneme durations fall within the longest one-third. The results of the penalty assignment are retained in the area 204. Control then proceeds to step S304.

At step S304, the data determination processing unit 208 determines a representative phoneme unit for the phoneme environment and fundamental frequency currently of interest. Here the set of phoneme data that has been assigned penalties based upon power and phoneme duration, stored in the area 204, is delivered to the sorting processing unit 206, and the sorting processing unit 206 is instructed to sort the results by penalty value. The sorting processing unit 206 performs sorting on the basis of the two types of penalties relating to power and phoneme duration (e.g., using the sum of the two penalty values) and stores the sorted results in the area 207. When the sorting ends, the data determination processing unit 208 selects the phoneme data having the smallest penalty and stores it in the area 209 for use as representative phoneme data. If a plurality of phoneme units share the minimum penalty value, the data determination processing unit 208 selects the phoneme unit located at the head of the sorted results. This is equivalent to selecting one phoneme unit arbitrarily from among those having the smallest penalty.
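Steps S302 to S304 can be combined into one hypothetical function, `choose_representative`. It assumes each retrieved item is a `(unit_id, power, duration)` tuple with a unique ID, uses the sum of the two penalties as in the example above, and relies on Python's stable sort so that ties are resolved in favour of the item that comes first in retrieval order, mirroring the "head of the sorted results" rule. The ⌊n/3⌋ split for each outer third is again an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (unit_id, power, duration), listed in retrieval order.
Item = Tuple[str, float, float]

def outer_thirds(items: List[Item], key_index: int, penalty: float = 2.0) -> Dict[str, float]:
    """Penalize the largest and smallest thirds of items by one attribute."""
    order = sorted(items, key=lambda it: it[key_index], reverse=True)
    n, third = len(order), len(order) // 3
    return {it[0]: (penalty if rank < third or rank >= n - third else 0.0)
            for rank, it in enumerate(order)}

def choose_representative(items: List[Item]) -> str:
    power_pen = outer_thirds(items, 1)        # step S302: power penalty
    dur_pen = outer_thirds(items, 2)          # step S303: phoneme-duration penalty
    total = {uid: power_pen[uid] + dur_pen[uid] for uid, _, _ in items}
    # Step S304: stable sort by total penalty, so equal totals keep retrieval order.
    ranked = sorted(items, key=lambda it: total[it[0]])
    return ranked[0][0]                       # head of the sorted results
```

With six candidates, only items lying in the middle third for both power and duration end up with a total of zero, and the first of those in retrieval order is chosen.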

Thus, in accordance with the first embodiment, the optimum phoneme data is selected, based upon a penalty relating to power and a penalty relating to phoneme duration, from a phoneme data set in which the phoneme environments and fundamental frequencies are identical.

[Second Embodiment]

The first embodiment has been described for a case where the phoneme environment (the "triphone", namely the phoneme of interest and one phoneme on each side of it) and the average fundamental frequency F₀ of the phoneme are used as criteria for selecting phoneme data. However, when a triphone whose combination is not contained in the database is required, an alternate "left-phone" (a phoneme environment comprising the phoneme of interest and the phoneme to its left), "right-phone" (a phoneme environment comprising the phoneme of interest and the phoneme to its right) or "phone" (the phoneme of interest alone) must be used. The second embodiment therefore describes a case in which the selection of phoneme data other than the specified triphone (such phoneme data will be referred to as a "triphone substitute") is taken into account.

FIG. 4 is a block diagram illustrating functions relating to phoneme data selection processing for selecting the optimum phoneme data from a set of phoneme data in which the phoneme environments and fundamental frequencies are identical. The functions are those of a speech synthesizing apparatus according to the second embodiment. This embodiment differs from the first embodiment (FIG. 2) in that the apparatus further includes a processing unit 410 for assigning an element-number penalty. The other areas and units 400 to 409 correspond to the areas and units 200 to 209, respectively, of FIG. 2. The processing unit 410 assigns a penalty depending upon the number of elements in a set of phoneme data.

The speech synthesizing processing includes a procedure, implemented by the above-described functional blocks, for selecting optimum phoneme data from a set of phoneme data having identical phoneme environments and fundamental frequencies. This procedure will now be described. FIG. 5 is a flowchart illustrating the procedure according to the second embodiment for selecting the optimum phoneme data from the set of phoneme data having identical phoneme environments and fundamental frequencies.

Steps S501 to S503 are similar to steps S301 to S303 (FIG. 3) of the first embodiment. It should be noted that if the specified triphone does not exist in the database, the triphone retrieval at step S501 retrieves the alternate candidates left-phone, right-phone or phone (the aforesaid "triphone substitute"). In this case, for example, the left-phone is retrieved first. If the left-phone does not exist in the database, the right-phone is retrieved. If the right-phone does not exist either, the phone is retrieved. Alternatively, the retrieval sequence may differ between vowels and consonants. For example, for a vowel, retrieval is carried out in the sequence left-phone, right-phone, phone; for a consonant, in the sequence right-phone, left-phone, phone.
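The substitute search order can be sketched as a list of wildcard patterns tried in turn. The label format `left.center.right` with `*` as a wildcard follows the t.A.k / t.A.* notation used in the example further on; the helper names and the representation of the inventory as a plain list of triphone labels (ignoring F₀) are assumptions made for brevity. Passing `is_vowel=True` gives the default left-phone, right-phone, phone order.

```python
from typing import List, Optional, Tuple

def candidate_patterns(triphone: str, is_vowel: bool) -> List[Tuple[str, str]]:
    """Search order: exact triphone first, then substitutes, with '*' as a wildcard."""
    left, center, right = triphone.split(".")
    left_phone = ("left-phone", f"{left}.{center}.*")
    right_phone = ("right-phone", f"*.{center}.{right}")
    # Vowels try left-phone before right-phone; consonants the reverse.
    middle = [left_phone, right_phone] if is_vowel else [right_phone, left_phone]
    return [("triphone", triphone)] + middle + [("phone", f"*.{center}.*")]

def matches(pattern: str, triphone: str) -> bool:
    return all(p in ("*", t) for p, t in zip(pattern.split("."), triphone.split(".")))

def retrieve_with_fallback(inventory: List[str], triphone: str,
                           is_vowel: bool) -> Tuple[Optional[str], List[str]]:
    for kind, pattern in candidate_patterns(triphone, is_vowel):
        hits = [label for label in inventory if matches(pattern, label)]
        if hits:                      # stop at the first level that yields data
            return kind, hits
    return None, []
```

The returned `kind` is what step S504 needs: anything other than `"triphone"` means a triphone substitute was obtained.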

In the second embodiment, the use of a triphone substitute means that the specified triphone does not exist; as long as the specified triphone is contained in the database, that triphone is adopted. At step S504, therefore, it is determined whether a triphone substitute has been obtained as the result of retrieval. If a triphone substitute has not been obtained, i.e., if the specified triphone has been obtained, control skips step S505 and proceeds to step S506, so that processing similar to that of the first embodiment is executed. If it is determined at step S504 that a triphone substitute has been retrieved, on the other hand, control proceeds to step S505. Here the processing unit 410 assigns a penalty depending upon the number of elements in the set of phoneme data. Specifically, when the specified triphone is absent, the processing unit 410 counts the number of elements contained in the phoneme data set for each triphone phoneme-environment group (a group classified by the environment comprising the phoneme concerned and one phoneme on each side of it) of the alternate candidate left-phone (or right-phone or phone). In this embodiment, if the number of items of phoneme data in an applicable triphone phoneme environment is small (two or fewer), the processing unit 410 adds a penalty (0.5 points) to all of the phoneme data concerned. In other words, the processing unit 410 judges that data appearing with only a low frequency in a sufficiently large database is not reliable.

For example, consider a case where the triphone t.A.k does not exist in the database and is to be replaced by the left-phone t.A.*. If two instances of the triphone t.A.p and 20 instances of the triphone t.A.t exist in the database, choosing the triphone substitute for t.A.k from among the 20 instances of t.A.t will provide a higher probability of obtaining phoneme data of good quality. Under the rule above, both t.A.p items receive the 0.5-point penalty, while the t.A.t items do not.
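The element-number penalty of step S505 amounts to grouping the substitute candidates by their full triphone and counting each group. The sketch below assumes candidates arrive as `(unit_id, triphone)` pairs; the function name is hypothetical, while the count limit (two or fewer) and the 0.5-point value are the example values from the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

def element_number_penalty(candidates: List[Tuple[str, str]], max_count: int = 2,
                           penalty: float = 0.5) -> Dict[str, float]:
    """candidates: (unit_id, triphone) pairs returned by a substitute retrieval."""
    counts = Counter(triphone for _, triphone in candidates)  # items per triphone group
    # Sparse groups (max_count items or fewer) are treated as unreliable.
    return {uid: (penalty if counts[triphone] <= max_count else 0.0)
            for uid, triphone in candidates}
```

Applied to the t.A.p / t.A.t example, the two t.A.p items each receive 0.5 and the twenty t.A.t items receive nothing; these values are then added to the power and duration penalties before the final sort.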

Once the penalty based upon the number of elements has been assigned, the result is stored in the area 404 for holding the results of penalty assignment, and control proceeds to step S506. Step S506 involves processing equivalent to that of step S304 in the first embodiment. In the second embodiment, a penalty based upon the number of elements is assigned in addition to the penalties based upon power and phoneme duration, so phoneme data is selected taking all three penalties into consideration. When the specified triphone is retrieved and processing proceeds directly from step S504 to step S506, the penalty based upon the number of elements is not taken into account.

Thus, in accordance with the second embodiment, it is possible to select proper phoneme data even when triphone substitutes must be used.

In the embodiments set forth above, penalty assignment processing is executed in the order of power penalty and phoneme-duration penalty (followed by the element-number penalty in the second embodiment). However, this does not impose a limitation upon the present invention, for the processing may be executed in any order. Further, an arrangement may be adopted in which these penalty assignment operations are executed concurrently.

Further, in each of the foregoing embodiments, 2.0 points is adopted as the value of the power and phoneme-duration penalties. However, this does not impose a limitation upon the present invention, for a suitable value may obviously be set. In addition, the penalties relating to the two characteristics need not be equal.

In the second embodiment, 0.5 is set as the value of the element-number penalty. However, this does not impose a limitation upon the present invention, for a suitable value may be set.

Furthermore, in each of the foregoing embodiments, a penalty is assigned to the one-third of the sorted phoneme data with the smallest values and to the one-third with the largest values. However, this does not impose a limitation upon the present invention. For example, the method of penalty assignment may be changed depending upon the number of items of phoneme data or the properties of the phoneme data contained in the database. In such a case, a penalty may be assigned to data whose difference from the average value is greater than a threshold value.
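The threshold-based alternative might look like the following sketch. Both the threshold value and the use of the arithmetic mean as the "average value" are assumptions (the description leaves the threshold unspecified), and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict

def deviation_penalty(values: Dict[str, float], threshold: float,
                      penalty: float = 2.0) -> Dict[str, float]:
    """Penalize items whose value differs from the set's mean by more than threshold."""
    avg = mean(values.values())
    return {uid: (penalty if abs(v - avg) > threshold else 0.0)
            for uid, v in values.items()}
```

Unlike the one-third rule, this variant can leave every item unpenalized (or penalize all of them), depending on how the data is spread around the mean.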

Further, the foregoing embodiments describe a method of selecting representative phoneme data in which the target is a phoneme data set that satisfies a specific phoneme environment and fundamental frequency. However, this does not impose a limitation upon the present invention. For example, it is possible to use a phoneme data set selected solely on the basis of the phoneme environment and to adopt the fundamental frequency as a factor for assigning a penalty.

Further, each of the above embodiments describes a method of selecting a representative phoneme unit on demand, in which the target is a phoneme data set that satisfies a specific phoneme environment and fundamental frequency. However, an arrangement may be adopted in which a phoneme lexicon is created in advance by applying the processing of the first embodiment to all conceivable phoneme environments and fundamental frequencies.
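A precomputed lexicon could be built by grouping records by phoneme environment and a quantized F₀ and choosing one representative per group in advance. The F₀ bin width, the record layout and the `build_lexicon` name are assumptions; the `choose` argument stands in for the penalty-based selection of the first embodiment. This sketch covers only the environments actually present in the database rather than every conceivable one.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical key: (triphone label, index of an F0 bin of width bin_hz).
Key = Tuple[str, int]

def build_lexicon(records: Iterable[Tuple[str, str, float]], bin_hz: float,
                  choose: Callable[[List[str]], str]) -> Dict[Key, str]:
    groups: Dict[Key, List[str]] = {}
    for unit_id, triphone, f0 in records:
        key = (triphone, int(f0 // bin_hz))       # quantize F0 into a bin
        groups.setdefault(key, []).append(unit_id)
    # Pick one representative per (environment, F0 bin) ahead of synthesis time.
    return {key: choose(ids) for key, ids in groups.items()}
```

At synthesis time, step S12 then reduces to a dictionary lookup on the same key.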

Further, in each of the foregoing embodiments, the sorting processing unit and the area for holding the sorted results are designed for general-purpose use. However, this does not impose a limitation upon the present invention. For example, an arrangement may be adopted in which one sorting processor is provided exclusively for the processing unit that assigns the power penalties and another exclusively for the processing unit that assigns the phoneme-duration penalties.

In each of the foregoing embodiments, the areas for storing data are implemented by memory (RAM). However, this does not impose a limitation upon the present invention, because any storage medium may be used.

Further, in each of the foregoing embodiments, the components are constituted by the same computer. However, this does not impose a limitation upon the present invention, because these components may be implemented by computers or processors distributed over a network.

Further, in each of the foregoing embodiments, the program is stored in a control memory (ROM). However, this does not impose a limitation upon the present invention, because the program may be stored in any storage medium. The operations performed by the program may also be carried out by circuitry.

The present invention can be applied to a system constituted by a plurality of devices or to an apparatus comprising a single device (e.g., a copier or facsimile machine).

Furthermore, it goes without saying that the invention is applicable also to a case where the object of the invention is attained by supplying a storage medium storing the program codes of the software for performing the functions of the foregoing embodiments to a system or an apparatus, reading the program codes with a computer (e.g., a CPU or MPU) of the system or apparatus from the storage medium, and then executing the program codes.

In this case, the program codes read from the storage medium implement the novel functions of the invention, and the storage medium storing the program codes constitutes the invention.

Further, a storage medium such as a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, non-volatile memory card or ROM can be used to provide the program codes.

Furthermore, besides the case where the aforesaid functions according to the embodiments are implemented by executing the program codes read by a computer, it goes without saying that the present invention covers a case where an operating system or the like running on the computer performs part or all of the process in accordance with the designation of the program codes and implements the functions according to the embodiments.

It goes without saying that the present invention further covers a case where, after the program codes read from the storage medium are written in a function expansion board inserted into the computer or in a memory provided in a function expansion unit connected to the computer, a CPU or the like contained in the function expansion board or function expansion unit performs part or all of the process in accordance with the designation of the program codes and implements the functions of the above embodiments.

Thus, in accordance with the present invention, as described above, it is possible to provide a speech synthesizing apparatus capable of selecting better phoneme units, as a result of which synthesized speech of superior quality can be produced. The invention also provides a method of controlling this apparatus and a storage medium storing a program for implementing this control method.

As many apparently widely different embodiments of the present invention can be made without departing from the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

1. A speech synthesizing apparatus comprising: storage means for storing plural items of phoneme data; retrieval means for retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored in said storage means; first penalty assigning means for assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved by said retrieval means; and selection means for selecting, from the phoneme data retrieved by said retrieval means, and based upon the penalty assigned by said first penalty assigning means, phoneme data to be employed in synthesis of a speech waveform, wherein the attribute values include power and phoneme duration of each item of phoneme data, and said first penalty assigning means assigns a power-related penalty in such a manner that a small penalty is assigned to phoneme data whose power is close to an average value of the power, and assigns a phoneme-duration-related penalty in such a manner that a small penalty is assigned to phoneme data whose phoneme duration is close to an average value of the phoneme duration.
2. The apparatus according to claim 1, wherein said storage means stores respective items of attribute information together with the plural items of phoneme data, and said first penalty assigning means obtains an attribute value from the attribute information stored in said storage means.
3. The apparatus according to claim 2, wherein the attribute information includes phoneme environment, phoneme boundary, fundamental frequency, power and phoneme duration.
4. The apparatus according to claim 1, wherein said retrieval means retrieves phoneme data that satisfies a specified phoneme environment.
5. The apparatus according to claim 1, wherein said retrieval means retrieves phoneme data that satisfies a specified phoneme environment and fundamental frequency.
6. The apparatus according to claim 1, wherein said first penalty assigning means sorts retrieved phoneme data based upon a prescribed attribute value and assigns a penalty value on the basis of the order obtained by sorting.
7. A speech synthesizing apparatus comprising: storage means for storing plural items of phoneme data; retrieval means for retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored in said storage means; first penalty assigning means for assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved by said retrieval means; and selection means for selecting, from the phoneme data retrieved by said retrieval means, and based upon the penalty assigned by said first penalty assigning means, phoneme data to be employed in synthesis of a speech waveform, wherein said first penalty assigning means: sorts the items of phoneme data in order of decreasing power and assigns a power-related penalty on the basis of the order obtained by sorting, in such a manner that a small penalty is assigned to phoneme data whose power is close to an average value; and sorts the items of phoneme data in order of decreasing phoneme duration and assigns a phoneme-duration-related penalty on the basis of the order obtained by sorting, in such a manner that a small penalty is assigned to phoneme data whose phoneme duration is close to an average value.
8. A speech synthesizing apparatus comprising: storage means for storing plural items of phoneme data; retrieval means for retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored in said storage means; first penalty assigning means for assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved by said retrieval means; selection means for selecting, from the phoneme data retrieved by said retrieval means, and based upon the penalty assigned by said first penalty assigning means, phoneme data to be employed in synthesis of a speech waveform; alternate retrieval means for retrieving phoneme data that satisfies some of the retrieval conditions in a case where phoneme data satisfying the retrieval conditions in said retrieval means does not exist; counting means for grouping phoneme data, which has been retrieved by said alternate retrieval means, on the basis of a phoneme environment, and counting the items of phoneme data on a per-group basis; and second penalty assigning means for assigning a penalty on the basis of a count obtained by said counting means to the phoneme data retrieved by said alternate retrieval means, this penalty being assigned in addition to the penalty assigned by said first penalty assigning means.
9. The apparatus according to claim 8, wherein the retrieval conditions include phoneme environment; and said alternate retrieval means retrieves phoneme data which agrees with part of a phoneme environment specified in the retrieval conditions.
10. The apparatus according to claim 9, wherein the phoneme environment specified in the retrieval conditions is a triphone composed of an applicable phoneme and phonemes on both sides thereof; and said alternate retrieval means retrieves phoneme data for which the applicable phoneme and its left side phoneme agree with the retrieval conditions, or phoneme data for which the applicable phoneme and its right side phoneme agree with the retrieval conditions.
11. A speech synthesizing method comprising: a storage step of storing plural items of phoneme data; a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at said storage step; a first penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at said retrieval step; and a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said first penalty assigning step, phoneme data employed in synthesis of a speech waveform, wherein the attribute values include power and phoneme duration of each item of phoneme data, and in the first penalty assigning step, a power-related penalty is assigned in such a manner that a small penalty is assigned to phoneme data whose power is close to an average value of the power, and a phoneme-duration-related penalty is assigned in such a manner that a small penalty is assigned to phoneme data whose phoneme duration is close to an average value of the phoneme duration.
12. The method according to claim 11, wherein said storage step stores respective items of attribute information together with the plural items of phoneme data; and said first penalty assigning step obtains an attribute value from the attribute information stored at said storage step.
13. The method according to claim 12, wherein the attribute information includes phoneme label, phoneme boundary, fundamental frequency, power and phoneme duration.
14. The method according to claim 11, wherein said retrieval step retrieves phoneme data that satisfies a specified phoneme environment.
15. The method according to claim 11, wherein said retrieval step retrieves phoneme data that satisfies a specified phoneme environment and fundamental frequency.
16. The method according to claim 11, wherein said first penalty assigning step sorts retrieved phoneme data based upon a prescribed attribute value and assigns a penalty value on the basis of the order obtained by sorting.
17. A speech synthesizing method comprising: a storage step of storing plural items of phoneme data; a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at said storage step; a first penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at said retrieval step, in which the penalty is assigned using power and phoneme duration of each item of phoneme data as the attribute value; and a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said first penalty assigning step, phoneme data employed in synthesis of a speech waveform, wherein said first penalty assigning step: sorts the items of phoneme data in order of decreasing power and assigns a power-related penalty on the basis of the order obtained by sorting, in such a manner that a small penalty is assigned to phoneme data whose power is close to an average value; and sorts the items of phoneme data in order of decreasing phoneme duration and assigns a phoneme-duration-related penalty on the basis of the order obtained by sorting, in such a manner that a small penalty is assigned to phoneme data whose phoneme duration is close to an average value.
18. A speech synthesizing method comprising: a storage step of storing plural items of phoneme data; a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at said storage step; a first penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at said retrieval step; a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said first penalty assigning step, phoneme data employed in synthesis of a speech waveform; an alternate retrieval step of retrieving phoneme data that satisfies some of the retrieval conditions in a case where phoneme data that conforms to the retrieval conditions at said retrieval step does not exist; a counting step of grouping phoneme data, which has been retrieved at said alternate retrieval step, on the basis of a phoneme environment, and counting the items of phoneme data on a per-group basis; and a second penalty assigning step of assigning a penalty on the basis of a count obtained at said counting step to the phoneme data retrieved at said alternate retrieval step, this penalty being assigned in addition to the penalty assigned at said first penalty assigning step.
19. The method according to claim 18, wherein the retrieval conditions include phoneme environment; and said alternate retrieval step retrieves phoneme data which agrees with part of a phoneme environment specified in the retrieval conditions.
20. The method according to claim 19, wherein the phoneme environment specified in the retrieval conditions is a triphone composed of an applicable phoneme and phonemes on both sides thereof; and said alternate retrieval step retrieves phoneme data for which the applicable phoneme and its left side phoneme agree with the retrieval conditions, or phoneme data for which the applicable phoneme and its right side phoneme agree with the retrieval conditions.
21. A storage medium storing a control program for causing a computer to execute speech synthesis using phoneme data, said control program having: code of a storage step of storing plural items of phoneme data; code of a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at said storage step; code of a first penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at said retrieval step; and code of a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said first penalty assigning step, phoneme data employed in synthesis of a speech waveform, wherein the attribute values include power and phoneme duration of each item of phoneme data, and in the first penalty assigning step, a power-related penalty is assigned in such a manner that a small penalty is assigned to phoneme data whose power is close to an average value of the power, and a phoneme-duration-related penalty is assigned in such a manner that a small penalty is assigned to phoneme data whose phoneme duration is close to an average value of the phoneme duration.
22. A storage medium storing a control program for causing a computer to execute speech synthesis using phoneme data, said control program having: code of a storage step of storing plural items of phoneme data; code of a retrieval step of retrieving phoneme data, in accordance with given retrieval conditions, from the plural items of phoneme data stored at said storage step; code of a first penalty assigning step of assigning a penalty that is based upon an attribute value to each item of phoneme data retrieved at said retrieval step; code of a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said first penalty assigning step, phoneme data employed in synthesis of a speech waveform; code of an alternate retrieval step of retrieving phoneme data that satisfies some of the retrieval conditions in a case where phoneme data that conforms to the retrieval conditions at said retrieval step does not exist; code of a counting step of grouping phoneme data, which has been retrieved at said alternate retrieval step, on the basis of a phoneme environment, and counting the items of phoneme data on a per-group basis; and code of a second penalty assigning step of assigning a penalty on the basis of a count obtained at said counting step to the phoneme data retrieved at said alternate retrieval step, this penalty being assigned in addition to the penalty assigned at said first penalty assigning step.
23. A speech synthesizing apparatus comprising: storage means for storing plural items of phoneme data, wherein each item of phoneme data includes attribute values for phoneme environment, phoneme boundary, fundamental frequency, power and phoneme duration; retrieval means for retrieving phoneme data from the plural items of phoneme data stored in said storage means; penalty assigning means for sorting the phoneme data retrieved by said retrieval means based upon a prescribed attribute value and for assigning a penalty to each item of the phoneme data on the basis of the order obtained by sorting, so that a larger penalty is added to phoneme data whose position in the order is near either end (smallest or largest) and a smaller penalty is added to phoneme data whose position in the order is near the middle; and selection means for selecting, from the phoneme data retrieved by said retrieval means, and based upon the penalty assigned by said penalty assigning means, phoneme data to be employed in synthesis of a speech waveform.
24. A speech synthesizing method comprising: a storage step of storing plural items of phoneme data, wherein each item of phoneme data includes attribute values for phoneme environment, phoneme boundary, fundamental frequency, power and phoneme duration; a retrieval step of retrieving phoneme data from the plural items of phoneme data stored at said storage step; a penalty assigning step of sorting the phoneme data retrieved at said retrieval step based upon a prescribed attribute value and of assigning a penalty to each item of the phoneme data on the basis of the order obtained by sorting, so that a larger penalty is added to phoneme data whose position in the order is near either end (smallest or largest) and a smaller penalty is added to phoneme data whose position in the order is near the middle; and a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said penalty assigning step, phoneme data employed in synthesis of a speech waveform.
25. A storage medium storing a control program for causing a computer to execute speech synthesis using phoneme data, said control program having: code of a storage step of storing plural items of phoneme data, wherein each item of phoneme data includes attribute values for phoneme environment, phoneme boundary, fundamental frequency, power and phoneme duration; code of a retrieval step of retrieving phoneme data from the plural items of phoneme data stored at said storage step; code of a penalty assigning step of sorting the phoneme data retrieved at said retrieval step based upon a prescribed attribute value and of assigning a penalty to each item of the phoneme data on the basis of the order obtained by sorting, so that a larger penalty is added to phoneme data whose position in the order is near either end (smallest or largest) and a smaller penalty is added to phoneme data whose position in the order is near the middle; and code of a selection step of selecting, from the phoneme data retrieved at said retrieval step, and based upon the penalty assigned at said penalty assigning step, phoneme data employed in synthesis of a speech waveform.
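The following Python sketch illustrates the rank-based penalty of claims 7, 16, 17 and 23 to 25. It works under stated assumptions and is not the patented implementation.

The claims do not give a penalty formula. This sketch therefore assumes the penalty is the distance of an item's rank from the middle rank, after sorting in order of decreasing value. Items near either end of the order are penalized heavily. Items near the middle, used here as a stand-in for "close to an average value", are penalized lightly. The `Phoneme` record and the `rank_penalties` and `select_typical` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    # Hypothetical record holding two of the attribute values named in the claims.
    label: str
    power: float
    duration: float


def rank_penalties(values):
    """Sort items in order of decreasing value and penalize each one by how far
    its rank lies from the middle rank (an assumed penalty function)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    middle = (n - 1) / 2
    penalties = [0.0] * n
    for rank, idx in enumerate(order):
        # Ranks near either end get large penalties, ranks near the middle small ones.
        penalties[idx] = abs(rank - middle)
    return penalties


def select_typical(candidates):
    """Sum the power-related and duration-related penalties and return the
    candidate with the smallest total, together with all totals."""
    if not candidates:
        raise ValueError("no phoneme data retrieved")
    power_pen = rank_penalties([c.power for c in candidates])
    dur_pen = rank_penalties([c.duration for c in candidates])
    totals = [p + d for p, d in zip(power_pen, dur_pen)]
    best = min(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best], totals


if __name__ == "__main__":
    cands = [
        Phoneme("a", 10.0, 50.0),
        Phoneme("b", 60.0, 80.0),
        Phoneme("c", 30.0, 60.0),
        Phoneme("d", 90.0, 200.0),
        Phoneme("e", 40.0, 70.0),
    ]
    chosen, totals = select_typical(cands)
    print(chosen.label, totals)  # e [4.0, 2.0, 2.0, 4.0, 0.0]
```

Penalizing by rank rather than by raw distance from the mean keeps the power and duration penalties on the same scale, so they can be summed directly. Whether the original apparatus combines them this way is not stated in the claims.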
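The fallback path of claims 8 to 10 and 18 to 20 can likewise be sketched in Python, assuming each unit carries its triphone context as (left, center, right) fields. When no unit matches the full triphone, the sketch keeps units whose center and left phonemes agree with the query, or whose center and right phonemes agree. It then groups those units by their full triphone and counts each group.

The claims do not say how the count maps to the second penalty, so the rule used here is an assumption: the largest group gets zero, and smaller groups get their shortfall relative to it. The `Unit` and `retrieve` names are also hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    # Hypothetical unit record: triphone context plus an identifier.
    left: str
    center: str
    right: str
    unit_id: int


def retrieve(db, left, center, right):
    """Return (units, second_penalties); second_penalties is empty when an
    exact triphone match exists."""
    exact = [u for u in db if (u.left, u.center, u.right) == (left, center, right)]
    if exact:
        return exact, {}
    # Alternate retrieval: center and left agree, or center and right agree.
    partial = [
        u for u in db
        if u.center == center and (u.left == left or u.right == right)
    ]
    if not partial:
        return [], {}
    # Group the alternates by their full phoneme environment and count each group.
    counts = Counter((u.left, u.center, u.right) for u in partial)
    largest = max(counts.values())
    # Assumed rule: units from sparsely populated environments are penalized more.
    second = {
        u.unit_id: largest - counts[(u.left, u.center, u.right)] for u in partial
    }
    return partial, second


if __name__ == "__main__":
    db = [
        Unit("a", "k", "i", 1),
        Unit("a", "k", "u", 2),
        Unit("a", "k", "u", 3),
        Unit("o", "k", "i", 4),
        Unit("e", "k", "o", 5),
        Unit("a", "t", "i", 6),
    ]
    units, second = retrieve(db, "a", "k", "e")
    print([u.unit_id for u in units], second)  # [1, 2, 3] {1: 1, 2: 0, 3: 0}
```

In the claimed arrangement, this second penalty is added to the penalty from the first penalty assigning step before selection takes place.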